Smoothed Analysis of Algorithms

Daniel A. Spielman* Shang-Hua Teng†



Abstract 

Spielman and Teng [STOC '01] introduced the smoothed analysis of algorithms to
provide a framework in which one could explain the success in practice of
algorithms and heuristics that could not be understood through the traditional
worst-case and average-case analyses. In this talk, we survey some of the
smoothed analyses that have been performed.
1. Introduction

The most common theoretical approach to understanding the behavior of algorithms
is worst-case analysis. In worst-case analysis, one proves a bound on the worst
possible performance an algorithm can have. A triumph of the Algorithms community
has been the proof that many algorithms have good worst-case performance, a strong
guarantee that is desirable in many applications. However, there are many
algorithms that work exceedingly well in practice, but which are known to perform
poorly in the worst case or lack good worst-case analyses. In an attempt to rectify
this discrepancy between theoretical analysis and observed performance, researchers
introduced the average-case analysis of algorithms. In average-case analysis, one
bounds the expected performance of an algorithm on random inputs. While a proof of
good average-case performance provides evidence that an algorithm may perform well
in practice, it can rarely be understood to explain the good behavior of an
algorithm in practice. A bound on the performance of an algorithm under one
distribution says little about its performance under another distribution, and may
say little about the inputs that occur in practice. Smoothed analysis is a hybrid
of worst-case and average-case analyses that inherits advantages of both.
In the formulation of smoothed analysis used in [27], we measure the maximum
over inputs of the expected running time of a simplex algorithm under slight random
perturbations of those inputs. To see how this measure compares with worst-case
and average-case analysis, let $X_n$ denote the space of linear-programming problems
of length $n$ and let $T(x)$ denote the running time of the simplex algorithm on input
$x$. Then, the worst-case complexity of the simplex algorithm is the function

$$C_{worst}(n) = \max_{x \in X_n} T(x),$$

and the average-case complexity of the algorithm is

$$C_{ave}(n) = \mathbf{E}_{r \in X_n} T(r),$$

under some suitable distribution on $X_n$. In contrast, the smoothed complexity of
the simplex algorithm is the function

$$C_{smooth}(n, \sigma) = \max_{x} \mathbf{E}_{r \in X_n} T(x + \sigma \|x\| r),$$

where $r$ is chosen according to some distribution, such as a Gaussian. In this case
$\sigma \|x\| r$ is a Gaussian random vector of standard deviation $\sigma \|x\|$. We multiply by
$\|x\|$ so that we can relate the magnitude of the perturbation to the magnitude of
that which it perturbs.

In the smoothed analysis of algorithms, we measure the expected performance
of algorithms under slight random perturbations of worst-case inputs. More
formally, we consider the maximum over inputs of the expected performance of
algorithms under slight random perturbations of those inputs. We then express this
expectation as a function of the input size and the magnitude of the perturbation.
While an algorithm with a good worst-case analysis will perform well on all inputs,
an algorithm with a good smoothed analysis will perform well on almost all inputs
in every small neighborhood of inputs. Smoothed analysis makes sense for
algorithms whose inputs are subject to slight amounts of noise in their low-order
digits, which is typically the case if they are derived from measurements of
real-world phenomena. If an algorithm takes such inputs and has a good smoothed
analysis, then it is unlikely that it will encounter an input on which it performs
poorly. The name "smoothed analysis" comes from the observation that if one
considers the running time of an algorithm as a function from inputs to time, then
the smoothed complexity of the algorithm is the highest peak in the plot of this
function after it is convolved with a small Gaussian.
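The definition of smoothed complexity can be made concrete with a small Monte Carlo sketch. Everything named here is an illustrative assumption rather than part of the analysis in [27]: `smoothed_cost` estimates the maximum, over a finite list of inputs, of the average cost under the perturbation $x + \sigma \|x\| r$ with $r$ a standard Gaussian vector, and `spike_cost` is a hypothetical cost function that is expensive only on a very thin set of inputs.

```python
import math
import random


def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))


def smoothed_cost(T, inputs, sigma, trials=200, seed=0):
    """Estimate max_x E_r[T(x + sigma * ||x|| * r)] over a finite list of inputs.

    r has independent standard Gaussian entries; the expectation is
    approximated by an average over `trials` samples.
    """
    rng = random.Random(seed)
    worst = float("-inf")
    for x in inputs:
        scale = sigma * norm(x)
        total = 0.0
        for _ in range(trials):
            perturbed = [v + scale * rng.gauss(0.0, 1.0) for v in x]
            total += T(perturbed)
        worst = max(worst, total / trials)
    return worst


def spike_cost(x):
    """Hypothetical cost: expensive only when x[0] is within 1e-3 of 1."""
    return 1000.0 if abs(x[0] - 1.0) < 1e-3 else 1.0


inputs = [[1.0, 0.0], [0.5, 0.5], [2.0, -1.0]]
# sigma = 0 leaves every input unperturbed, so this is the worst case over the list.
worst_case = smoothed_cost(spike_cost, inputs, sigma=0.0, trials=1)
# A small perturbation almost always moves the input off the expensive spike.
smoothed = smoothed_cost(spike_cost, inputs, sigma=0.1)
```

In this toy setting the worst-case value is set entirely by the isolated bad input, while the smoothed estimate is far smaller, which is the gap the smoothed measure is meant to expose.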

In our paper introducing smoothed analysis, we proved that the simplex method
has polynomial smoothed complexity [27]. The simplex method, which has been
the most popular method of solving linear programs since the late 1940's, is the
canonical example of a practically useful algorithm that could not be understood
theoretically. While it was known to work very well in practice, contrived examples
on which it performed poorly proved that it had horrible worst-case complexity
[19, 20, 14, 13, 17]. The average-case complexity of the simplex method was proved
to be polynomial, but this result was not considered to explain the performance of
the algorithm in practice.
2. The simplex method for linear programming

We recall that a linear programming problem can be written in the form

    maximize    $x^T c$
    subject to  $x^T a_i \le b_i$, for $1 \le i \le n$,        (2.1)

where $c \in \mathbb{R}^d$, $a_i \in \mathbb{R}^d$ and $b_i \in \mathbb{R}$, for $1 \le i \le n$. In [27], we bound the
smoothed complexity of a particular two-phase simplex method that uses the
shadow-vertex pivot rule to solve linear programs in this form.

We recall that the constraints of the linear program, that $x^T a_i \le b_i$, confine
$x$ to a (possibly open) polytope, and that the solution to the linear program is a
vertex of this polytope. Simplex methods work by first finding some vertex of the
polytope, and then walking along the 1-faces of the polytope from vertex to vertex,
improving the objective function at each step. The pivot rule of a simplex algorithm
dictates which vertex the algorithm should walk to when it has many to choose from.
The shadow-vertex method is inspired by the simplicity of the simplex method in
two dimensions: in two dimensions, the polytope is a polygon and the choice of
next vertex is always unique. To lift this simplicity to higher dimensions, the
shadow-vertex simplex method considers the orthogonal projection of the polytope
defined by the constraints onto a two-dimensional space. The method then walks
along the vertices of the polytope that are the pre-images of the vertices of the
shadow polygon. By taking the appropriate shadow, it is possible to guarantee that
the vertex optimizing the objective function will be encountered during this walk.
Thus, the running time of the algorithm may be bounded by the number of vertices
lying on the shadow polygon. Our first step in proving a bound on this number is
a smoothed analysis of the number of vertices in a shadow. For example, we prove
the bound:

Theorem 2.1 (Shadow Size) Let $d \ge 3$ and $n > d$. Let $c$ and $t$ be independent
vectors in $\mathbb{R}^d$, and let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ of variance
$\sigma^2 \le 1/(9 d \log n)$ centered at points each of norm at most 1. Then, the expected
number of vertices of the shadow polygon formed by the projection of
$\{x : x^T a_i \le 1\}$ onto Span$(t, c)$ is at most

$$\frac{58{,}888{,}678\, n d^3}{\sigma^6}.$$

This bound does not immediately lead to a bound on the running time of a
shadow-vertex method as it assumes that $t$ and $c$ are fixed before the $a_i$'s are
chosen, while in a simplex method the plane on which the shadow is followed depends
upon the $a_i$'s. However, we are able to use the shadow size bound as a black box to
prove the following for a particular randomized two-phase shadow-vertex simplex
method:

Theorem 2.2 (Simplex Method) Let $d \ge 3$ and $n \ge d + 1$. Let $c \in \mathbb{R}^d$
and $b \in \{-1, 1\}^n$. Let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ of variance
$\sigma^2 \le 1/(9 d \log n)$ centered at points each of norm at most 1. Then the expected
number of simplex steps taken by the two-phase shadow-vertex simplex algorithm to
solve the program specified by $b$, $c$, and $a_1, \ldots, a_n$ is at most

$$(nd/\sigma)^{O(1)},$$

where the expectation is over the choice of $a_1, \ldots, a_n$ and the random choices made
by the algorithm.

While the proofs of Theorems 2.1 and 2.2 are quite involved, we can provide the
reader with this intuition for Theorem 2.1: after perturbation, most of the vertices
of the polytope defined by the linear program have an angle bounded away from
flat. This statement is not completely precise because "most" should be interpreted
under a measure related to the chance a vertex appears in the shadow, as opposed
to counting the number of vertices. Also, there are many ways of measuring
high-dimensional angles, and different approaches are used in different parts of the
proof. However, this intuitive statement tells us that most vertices on the shadow
polygon should have angle bounded away from flat, which means that there cannot be
too many of them.
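To make the notion of a shadow polygon concrete, the following sketch counts its vertices for a small example. It rests on a simplifying assumption that is not the setting of Theorem 2.1: the polytope is given by an explicit list of its vertices rather than by the constraints $x^T a_i \le b_i$, so the shadow is simply the planar convex hull of the projected vertices. The function names are hypothetical, and this is not the shadow-vertex pivot rule itself.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def plane_basis(t, c):
    """Orthonormal basis of Span(t, c) by Gram-Schmidt (t, c assumed independent)."""
    nt = math.sqrt(dot(t, t))
    u = [a / nt for a in t]
    proj = dot(c, u)
    w = [a - proj * b for a, b in zip(c, u)]
    nw = math.sqrt(dot(w, w))
    return u, [a / nw for a in w]


def cross(o, a, b):
    """Signed area of the parallelogram spanned by a - o and b - o."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; collinear and duplicate points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 1e-12:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    return lower[:-1] + upper[:-1]


def shadow_vertex_count(vertices, t, c):
    """Number of vertices of the orthogonal projection onto Span(t, c)."""
    u, v = plane_basis(t, c)
    projected = [(round(dot(x, u), 12), round(dot(x, v), 12)) for x in vertices]
    return len(convex_hull(projected))


cube = list(itertools.product((0.0, 1.0), repeat=3))
square_shadow = shadow_vertex_count(cube, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hexagon_shadow = shadow_vertex_count(cube, (1.0, -1.0, 0.0), (1.0, 1.0, -2.0))
```

The same cube casts a square shadow on a coordinate plane and a hexagonal shadow on the plane orthogonal to $(1, 1, 1)$, so the length of a shadow walk depends on the choice of plane as well as on the polytope.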

One way in which angles of vertices are measured is by the condition number
of their defining equations. A vertex of the polytope is given by a set of equations
of the form

$$Cx = b.$$

The condition number of $C$ is defined to be

$$\kappa(C) = \|C\| \, \|C^{-1}\|,$$

where we recall that

$$\|C\| = \max_{x} \frac{\|Cx\|}{\|x\|}$$

and that

$$\|C^{-1}\|^{-1} = \min_{x} \frac{\|Cx\|}{\|x\|}.$$

The condition number is a measure of the sensitivity of x to changes in C and b, and 
is also a normalized measure of the distance of C to the set of singular matrices. For 
more information on the condition number of a matrix, we refer the reader to one 
of [ p"5[ p9| , [To| | . Condition numbers play a fundamental role in Numerical Analysis, 
which we will now discuss. 
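As a small numerical illustration of this definition, the sketch below computes $\kappa(C)$ for 2-by-2 matrices from the closed form for their singular values and shows how an ill-conditioned system amplifies a change in $b$. The restriction to dimension two, the helper names and the example matrices are assumptions made for the illustration.

```python
import math


def singular_values_2x2(m):
    """Largest and smallest singular values of a 2-by-2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    frob_sq = a * a + b * b + c * c + d * d
    disc = max(frob_sq * frob_sq - 4.0 * det * det, 0.0)
    s_max = math.sqrt((frob_sq + math.sqrt(disc)) / 2.0)
    # s_min * s_max = |det|, which avoids cancellation in the smaller root.
    s_min = abs(det) / s_max if s_max > 0.0 else 0.0
    return s_max, s_min


def condition_number_2x2(m):
    """kappa(C) = ||C|| * ||C^{-1}|| = s_max / s_min (infinite if C is singular)."""
    s_max, s_min = singular_values_2x2(m)
    return math.inf if s_min == 0.0 else s_max / s_min


def solve_2x2(m, rhs):
    """Solve C x = rhs by Cramer's rule (C assumed nonsingular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)


well = ((2.0, 0.0), (0.0, 0.5))
ill = ((1.0, 1.0), (1.0, 1.0001))

kappa_well = condition_number_2x2(well)
kappa_ill = condition_number_2x2(ill)

# Changing b by 1e-4 in one coordinate moves the solution of the
# ill-conditioned system from about (1, 1) to about (0, 2).
x_before = solve_2x2(ill, (2.0, 2.0001))
x_after = solve_2x2(ill, (2.0, 2.0002))
```

The rows of `ill` are nearly parallel, so the matrix is close to singular and its condition number is large, in line with the reading of $\kappa(C)$ as a normalized distance to the singular matrices.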



3. Smoothed complexity framework for numerical 
analysis 

The condition number of a problem instance is generally defined to be the
sensitivity of the output to slight perturbations of the problem instance. In
Numerical Analysis, one often bounds the running time of an iterative algorithm in
terms of the condition number of its input. Classical examples of algorithms subject
to such analyses include Newton's method for root finding and the conjugate gradient
method of solving systems of linear equations. For example, the number of
iterations taken by the method of conjugate gradients is proportional to the square
root of the condition number. Similarly, the running times of interior-point methods
have been bounded in terms of condition numbers [22].
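The dependence of the conjugate gradient method on the condition number can be observed in a small experiment. The sketch below is an illustration under stated assumptions: the matrix is taken to be diagonal with eigenvalues spread evenly over $[1, \kappa]$ (in exact arithmetic the method behaves the same on any orthogonally similar matrix), the stopping rule is a relative residual tolerance, and the function names are hypothetical.

```python
import math


def conjugate_gradient_diag(eigs, b, tol=1e-8, max_iters=10000):
    """Run conjugate gradients on A x = b with A = diag(eigs), all eigs > 0.

    Returns the approximate solution and the number of iterations taken
    until the residual norm drops below tol * ||b||.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # first search direction
    rr = sum(v * v for v in r)
    b_norm = math.sqrt(rr)
    for k in range(1, max_iters + 1):
        Ap = [e * v for e, v in zip(eigs, p)]
        alpha = rr / sum(u * v for u, v in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = sum(v * v for v in r)
        if math.sqrt(rr_new) <= tol * b_norm:
            return x, k
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    raise RuntimeError("conjugate gradients did not converge")


def spectrum(kappa, n):
    """n eigenvalues spaced evenly between 1 and kappa."""
    return [1.0 + (kappa - 1.0) * i / (n - 1) for i in range(n)]


n = 100
b = [1.0] * n
iterations = {}
for kappa in (1.0, 4.0, 400.0):
    _, iterations[kappa] = conjugate_gradient_diag(spectrum(kappa, n), b)
```

On this family the iteration count grows as the condition number increases, which is the kind of condition-number-dependent running time the framework in this section is built around.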
Blum suggested that a complexity theory of numerical algorithms should
be parameterized by the condition number of an input in addition to the input size.
Smale [26] proposed a complexity theory of numerical algorithms in which one:

1. proves a bound on the running time of an algorithm solving a problem in terms
of its condition number, and then

2. proves that it is unlikely that a random problem instance has large condition
number.

This program is analogous to the average-case complexity of Theoretical Computer
Science and hence shares the same shortcoming in modeling practical performance
of numerical algorithms.

To better model the inputs that occur in practice, we propose replacing step
2 of Smale's program with

2'. prove that for every input instance it is unlikely that a slight random
perturbation of that instance has large condition number.

That is, we propose to bound the smoothed value of the condition number. In
contrast with the average-case analysis of condition numbers, our analysis can be
interpreted as demonstrating that if there is a little bit of imprecision or noise in
the input, then it is unlikely that the input is ill-conditioned. The combination of
step 2' with step 1 of Smale's program provides a simple framework for performing
smoothed analysis of numerical algorithms whose running time can be bounded by the
condition number of the input.



4. Condition numbers of matrices 

One of the most fundamental condition numbers is the condition number of
matrices defined at the end of Section 2. In his paper, "The probability that a
numerical analysis problem is difficult", Demmel proved that it is unlikely that
a Gaussian random matrix centered at the origin has large condition number.
Demmel's bounds on the condition number were improved by Edelman [12]. As bounds
on the norm of a random matrix are standard, we focus on the norm of the inverse,
for which Edelman proved:

Theorem 4.1 (Edelman) Let $G$ be a $d$-by-$d$ matrix of independent Gaussian
random variables of variance 1 and mean 0. Then,

$$\Pr\left[\|G^{-1}\| > t\right] \le \frac{\sqrt{d}}{t}.$$

We obtain a smoothed analogue of this bound in work with Sankar. That
is, we show that for every matrix it is unlikely that a slight perturbation of that
matrix has large condition number. The key technical statement is:

Theorem 4.2 (Sankar-Spielman-Teng) Let $\bar{A}$ be an arbitrary $d$-by-$d$ real
matrix and $A$ a matrix of independent Gaussian random variables centered at $\bar{A}$,
each of variance $\sigma^2$. Then

$$\Pr\left[\|A^{-1}\| > x\right] \le 1.823\, \frac{\sqrt{d}}{x\sigma}.$$
In contrast with the techniques used by Demmel and Edelman, the techniques
used in the proof of Theorem 4.2 are geometric and completely elementary. We now
give the reader a taste of these techniques by proving the simpler:

Theorem 4.3 Let $\bar{A}$ be an arbitrary $d$-by-$d$ real matrix and $A$ a matrix of
independent Gaussian random variables centered at $\bar{A}$, each of variance $\sigma^2$. Then,

$$\Pr\left[\|A^{-1}\| > x\right] \le \frac{d^{3/2}}{x\sigma}.$$

The first step of the proof is to relate $\|A^{-1}\|$ to a geometric quantity of the
vectors in the matrix $A$. The second step is to bound the probability of a
configuration under which this geometric quantity is small. The geometric quantity
is given by:

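Definition For $d$ vectors in $\mathbb{R}^d$, $a_1, \ldots, a_d$, define

$$\mathrm{height}(a_1, \ldots, a_d) = \min_i \mathrm{dist}\left(a_i, \mathrm{Span}(a_1, \ldots, \hat{a}_i, \ldots, a_d)\right),$$

where $\hat{a}_i$ indicates that $a_i$ is omitted.

Lemma 4.5 For $d$ vectors in $\mathbb{R}^d$, $a_1, \ldots, a_d$,

$$\|(a_1, \ldots, a_d)^{-1}\| \le \sqrt{d} / \mathrm{height}(a_1, \ldots, a_d).$$

Proof. Let $t$ be a unit vector such that

$$\left\|\sum_{i=1}^{d} t_i a_i\right\| = 1 / \|(a_1, \ldots, a_d)^{-1}\|.$$

Without loss of generality, let $t_1$ be the largest entry of $t$ in absolute value, so
$|t_1| \ge 1/\sqrt{d}$. Then, we have

$$\left\|a_1 + \sum_{i=2}^{d} (t_i/t_1)\, a_i\right\| \le \sqrt{d} / \|(a_1, \ldots, a_d)^{-1}\|,$$

and therefore

$$\mathrm{dist}(a_1, \mathrm{Span}(a_2, \ldots, a_d)) \le \sqrt{d} / \|(a_1, \ldots, a_d)^{-1}\|. \qquad \Box$$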

Proof of Theorem 4.3. Let $a_1, \ldots, a_d$ denote the columns of $A$. Lemma 4.5
tells us that if $\|A^{-1}\| > x$, then $\mathrm{height}(a_1, \ldots, a_d)$ is less than $\sqrt{d}/x$. For each
$i$, the probability that the height of $a_i$ above $\mathrm{Span}(a_1, \ldots, \hat{a}_i, \ldots, a_d)$ is less than
$\sqrt{d}/x$ is at most

$$\sqrt{d}/x\sigma.$$

Thus,

$$\Pr\left[\mathrm{height}(a_1, \ldots, a_d) < \sqrt{d}/x\right]
  \le \Pr\left[\exists i : \mathrm{dist}\left(a_i, \mathrm{Span}(a_1, \ldots, \hat{a}_i, \ldots, a_d)\right) < \sqrt{d}/x\right]
  \le d^{3/2}/x\sigma;$$

so,

$$\Pr\left[\|(a_1, \ldots, a_d)^{-1}\| > x\right] \le d^{3/2}/x\sigma. \qquad \Box$$
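Both steps of this argument can be checked numerically in the smallest nontrivial case. The sketch below is restricted to $d = 2$, where $\|A^{-1}\|$ and the height have closed forms; the singular center matrix, the values of $\sigma$ and $x$, the number of trials and the function names are all assumptions chosen for the illustration. It verifies the inequality of Lemma 4.5 on every sample and compares the empirical frequency of $\|A^{-1}\| > x$ with the bound $d^{3/2}/x\sigma$ of Theorem 4.3.

```python
import math
import random


def inverse_norm_and_height(a1, a2):
    """Return (||A^{-1}||, height(a1, a2)) for the 2-by-2 matrix with columns a1, a2."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if det == 0.0:
        return math.inf, 0.0
    frob_sq = a1[0] ** 2 + a1[1] ** 2 + a2[0] ** 2 + a2[1] ** 2
    disc = max(frob_sq ** 2 - 4.0 * det ** 2, 0.0)
    s_max = math.sqrt((frob_sq + math.sqrt(disc)) / 2.0)
    inv_norm = s_max / abs(det)      # equals 1 / s_min because s_min * s_max = |det|
    # In the plane, dist(a1, Span(a2)) = |det| / ||a2||, and symmetrically for a2.
    height = abs(det) / max(math.hypot(*a1), math.hypot(*a2))
    return inv_norm, height


def estimate_tail(center_columns, sigma, x, trials=4000, seed=1):
    """Empirical Pr[||A^{-1}|| > x] for Gaussian perturbations of the center."""
    rng = random.Random(seed)
    d = 2
    exceed = 0
    lemma_holds = True
    for _ in range(trials):
        cols = [[m + sigma * rng.gauss(0.0, 1.0) for m in col] for col in center_columns]
        inv_norm, height = inverse_norm_and_height(cols[0], cols[1])
        # Lemma 4.5: ||A^{-1}|| <= sqrt(d) / height (small slack for rounding).
        if height > 0.0 and inv_norm > math.sqrt(d) / height * (1.0 + 1e-9):
            lemma_holds = False
        if inv_norm > x:
            exceed += 1
    return exceed / trials, lemma_holds


center = [[1.0, 1.0], [1.0, 1.0]]    # columns of a singular center matrix
sigma, x = 0.1, 50.0
empirical, lemma_holds = estimate_tail(center, sigma, x)
bound = 2 ** 1.5 / (x * sigma)       # d^{3/2} / (x * sigma) with d = 2
```

Even though the center matrix is exactly singular, the perturbed matrices have a large inverse norm only a modest fraction of the time, and that fraction sits below the bound of Theorem 4.3.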
Conjecture 1 Let $\bar{A}$ be an arbitrary $d$-by-$d$ real matrix and $A$ a matrix of
independent Gaussian random variables centered at $\bar{A}$, each of variance $\sigma^2$. Then

$$\Pr\left[\|A^{-1}\| > x\right] \le \frac{\sqrt{d}}{x\sigma}.$$

5. Smoothed condition numbers of linear programs 

The Perceptron algorithm solves linear programs of the following simple form:

Given a set of points $a_1, \ldots, a_n$, find a vector $x$ such that $\langle a_i | x \rangle > 0$
for all $i$, if one exists.

One can define the condition number of the Perceptron problem to be the reciprocal
of the "wiggle room" of the input. That is, let $S = \{x : \langle a_i | x \rangle > 0, \forall i\}$ and

$$\nu(a_1, \ldots, a_n) = \max_{x \in S} \min_i \frac{|\langle a_i | x \rangle|}{\|a_i\| \, \|x\|}.$$

Then, the condition number of the Perceptron problem is defined to be
$1/\nu(a_1, \ldots, a_n)$.

The Perceptron algorithm works as follows: (1) Initialize $x = 0$; (2) Select any
$a_i$ such that $\langle a_i | x \rangle \le 0$ and set $x = x + a_i / \|a_i\|$; (3) while $x \notin S$, go back to
step (2).
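The three steps above can be sketched directly in code. The example points, the witness vector and the iteration cap are assumptions made for the illustration, and the rule for choosing which violated constraint to use (the first one found) is one arbitrary choice among those the description allows.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def perceptron(points, max_iters=100000):
    """Find x with <a_i | x> > 0 for all i; return (x, number of updates)."""
    x = [0.0] * len(points[0])                      # step (1)
    for updates in range(max_iters + 1):
        violated = next((a for a in points if dot(a, x) <= 0.0), None)
        if violated is None:                        # x is in S, stop
            return x, updates
        length = math.sqrt(dot(violated, violated))
        x = [xi + ai / length for xi, ai in zip(x, violated)]   # step (2)
    raise RuntimeError("no feasible x found within max_iters updates")


def normalized_margin(points, x):
    """min_i <a_i | x> / (||a_i|| ||x||), a lower bound on the wiggle room nu."""
    nx = math.sqrt(dot(x, x))
    return min(dot(a, x) / (math.sqrt(dot(a, a)) * nx) for a in points)


points = [(1.0, 0.1), (-0.9, 1.0), (0.2, 1.0), (-0.5, 1.0)]
solution, updates = perceptron(points)

# Any feasible witness gives a lower bound on nu, and hence an upper bound
# 1 / margin**2 on the number of updates (see Theorem 5.1).
witness = (0.5, 1.0)
update_bound = 1.0 / normalized_margin(points, witness) ** 2
```

Because $\nu$ is a maximum over feasible directions, the margin achieved by any single witness can only underestimate it, so the resulting iteration bound is valid but possibly loose.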

Using the following two lemmas, Blum and Dunagan obtained a smoothed
analysis of the Perceptron algorithm.

Theorem 5.1 (Block-Novikoff) On input $a_1, \ldots, a_n$, the perceptron algorithm
terminates in at most $1/(\nu(a_1, \ldots, a_n))^2$ iterations.

Theorem 5.2 (Blum-Dunagan) Let $a_1, \ldots, a_n$ be Gaussian random vectors in
$\mathbb{R}^d$ of variance $\sigma^2 \le 1/(2d)$ centered at points each of norm at most 1. Then,

$$\Pr\left[\frac{1}{\nu(a_1, \ldots, a_n)} > t\right] \le \frac{n d^{1.5}}{\sigma t} \log \frac{\sigma t}{d^{1.5}}.$$

Setting $t = n d^{1.5} \log(n/\delta) / (\delta\sigma)$, Blum and Dunagan concluded

Theorem 5.3 (Blum-Dunagan) Let $a_1, \ldots, a_n$ be Gaussian random vectors
in $\mathbb{R}^d$ of variance $\sigma^2 \le 1/(2d)$ centered at points each of norm at most 1. Then,
there exists a constant $c$ such that the probability that the perceptron takes more
than $c\, d^3 n^2 \log^2(n/\delta) / (\delta^2 \sigma^2)$ iterations is at most $\delta$.

In his seminal work, Renegar |2ll |2^| defines the condition of a linear 
program to be the normalized reciprocal of its distance to the set of ill-posed linear 
programs, where an ill-posed program is one that can be made both feasible and 
infeasible or bounded and unbounded by arbitrarily small changes to its constraints. 

Renegar proved the following theorem. 

Theorem 5.4 (Renegar) There is an interior point method such that, on input
a linear program specified by $(A, b, c)$ and an $\epsilon > 0$, it will terminate in

$$O\left(\sqrt{n + d}\, \log(\kappa(A, b, c)/\epsilon)\right)$$

iterations and return either a solution within $\epsilon$ of the optimal or a certificate that
the linear program is infeasible or unbounded.

With Dunagan, we recently proved the following smoothed bound on the
condition number of a linear program [11]:

Theorem 5.5 (Dunagan-Spielman-Teng) For any $\sigma^2 \le 1/(nd)$, let
$A = (a_1, \ldots, a_n)$ be a set of Gaussian random vectors in $\mathbb{R}^d$ of variance $\sigma^2$
centered at points $\bar{a}_1, \ldots, \bar{a}_n$, let $b$ be a Gaussian random vector in $\mathbb{R}^n$ of variance
$\sigma^2$ centered at $\bar{b}$ and let $c$ be a Gaussian random vector in $\mathbb{R}^d$ of variance $\sigma^2$
centered at $\bar{c}$ such that $\sum_{i=1}^{n} \|\bar{a}_i\|^2 + \|\bar{b}\|^2 + \|\bar{c}\|^2 \le 1$. Then


Pr AAc [C(A, b, c)>t}< — log 2 — , 

and hence 

$$\mathbf{E}_{A,b,c}\left[\log C(A, b, c)\right] \le 21 + 3 \log(nd/\sigma).$$

Combining these two theorems, we have

Theorem 5.6 (Smoothed Complexity of Interior Point Methods) Let $\sigma$ and
$(A, b, c)$ be as given in Theorem 5.5. Then, Renegar's interior point method solves
the linear program specified by $(A, b, c)$ to within precision $\epsilon$ in expected

$$O\left(\sqrt{n + d}\, (21 + 3 \log(nd/\sigma\epsilon))\right)$$

iterations.
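The combination behind Theorem 5.6 can be written out in a few lines. The derivation below is a sketch under two labeled assumptions: the quantity $C(A, b, c)$ of Theorem 5.5 is read as the condition number $\kappa(A, b, c)$ appearing in Theorem 5.4, and $0 < \epsilon \le 1$, so that $\log(1/\epsilon) \le 3 \log(1/\epsilon)$. The symbol $K$ stands for the unspecified constant hidden in the $O(\cdot)$ of Theorem 5.4.

```latex
% Assumptions: C(A,b,c) plays the role of \kappa(A,b,c) in Theorem 5.4,
% K is the constant hidden in its O(.), and 0 < \epsilon \le 1.
\begin{align*}
\mathbf{E}_{A,b,c}\left[\text{number of iterations}\right]
  &\le K \sqrt{n+d}\; \mathbf{E}_{A,b,c}\left[\log\frac{C(A,b,c)}{\epsilon}\right] \\
  &=   K \sqrt{n+d}\, \left(\mathbf{E}_{A,b,c}\left[\log C(A,b,c)\right] + \log\frac{1}{\epsilon}\right) \\
  &\le K \sqrt{n+d}\, \left(21 + 3\log\frac{nd}{\sigma} + \log\frac{1}{\epsilon}\right) \\
  &\le K \sqrt{n+d}\, \left(21 + 3\log\frac{nd}{\sigma\epsilon}\right).
\end{align*}
```

Only the expectation bound of Theorem 5.5 is used here, not its tail bound, because the iteration count of Theorem 5.4 depends on the condition number only through its logarithm.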



6. Two open problems 

As the norm of the inverse of a matrix is such a fundamental quantity, it is
natural to ask how the norms of the inverses of the $\binom{n}{d}$ $d$-by-$d$ square sub-matrices
of a $d$-by-$n$ matrix behave. Moreover, a crude bound on the probability that many
of these are large is a dominant term in the analysis of complexity of the simplex
method in [27]. The bound obtained in that paper is:

Lemma 6.1 Let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ of variance
$\sigma^2 \le 1/(9 d \log n)$ centered at points of norm at most 1. For $I \in \binom{[n]}{d}$ a $d$-set, let $X_I$
denote the indicator random variable that is 1 if



[ai-.iel] 1 



> 



8d 3 /2„7 ■ 



Then, 



Pr„ 



■5>< 



n 

d-1 



> 1 



n -d _ n -n+d-l _ n -2.9d+l_ 



Clearly, one should be able to prove a much stronger bound than that stated 
here, and thereby improve the bounds on the smoothed complexity of the simplex 
method. 
While much is known about the condition numbers of random matrices drawn
from continuous distributions, much less is known about matrices drawn from
discrete distributions. We conjecture:

Conjecture 2 Let $A$ be a $d$-by-$d$ matrix of independently and uniformly chosen
$\pm 1$ entries. Then,



for some absolute constant $\alpha < 1$.

We remark that the case $t = \infty$, when the matrix $A$ is singular, follows from
a theorem of Kahn, Komlós and Szemerédi.